The majority of online display ads are served through real-time bidding (RTB): each ad impression is auctioned off in real time at the moment it is generated by a user visit. To place ads automatically and optimally, it is critical for advertisers to devise a learning algorithm that bids on each ad impression cleverly in real time. Most previous work treats the bid decision as a static optimization problem, either valuing each impression independently or setting a bid price for each segment of ad volume. However, bidding for a given ad campaign happens repeatedly over its life span until the budget runs out. Each bid is therefore strategically coupled, through the constrained budget, to the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. Thus, it is of great interest to devise the optimal bidding strategy sequentially, so that the campaign budget can be dynamically allocated across all available impressions on the basis of both immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem arising from the large real-world auction volume and campaign budget is handled by approximating the state value with neural networks.
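The sequential budget-allocation idea can be illustrated with a small tabular sketch. This is not the paper's algorithm: it assumes a hypothetical second-price auction with a uniform market-price distribution and a fixed click-through rate (`MAX_PRICE` and `CTR` below are made-up parameters), and it solves the toy MDP exactly by dynamic programming over the state (remaining auctions, remaining budget), where the action is the bid price.

```python
# Toy illustration of sequential bidding as an MDP (assumed model, not the
# paper's formulation): the market price of each impression is uniform on
# {1..MAX_PRICE}, a won impression pays the second price and yields CTR
# expected clicks. State = (t auctions left, b budget left); action = bid.
MAX_PRICE = 10   # assumed upper bound on the market price
CTR = 0.1        # assumed expected clicks per won impression

def solve_value(T, B):
    """Backward dynamic programming: V[t][b] is the expected total clicks
    achievable with t auctions and budget b; policy[t][b] is the optimal bid."""
    V = [[0.0] * (B + 1) for _ in range(T + 1)]
    policy = [[0] * (B + 1) for _ in range(T + 1)]
    for t in range(1, T + 1):
        for b in range(B + 1):
            best_val, best_bid = V[t - 1][b], 0   # bidding 0 loses for sure
            for bid in range(1, min(b, MAX_PRICE) + 1):
                val = 0.0
                for p in range(1, MAX_PRICE + 1):  # enumerate market prices
                    if bid >= p:
                        # Win: collect CTR expected clicks, pay second price p.
                        val += (CTR + V[t - 1][b - p]) / MAX_PRICE
                    else:
                        # Lose: budget is preserved for future auctions.
                        val += V[t - 1][b] / MAX_PRICE
                if val > best_val:
                    best_val, best_bid = val, bid
            V[t][b] = best_val
            policy[t][b] = best_bid
    return V, policy
```

Even in this toy model, the optimal bid depends on the remaining budget and the number of auctions still to come, which is the core argument for a sequential rather than static formulation; the paper's neural state-value approximation replaces the explicit table when the auction volume and budget make it intractable.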